A THREE-TERM DESCENT CONJUGATE GRADIENT

Abstract. A three-term descent conjugate gradient algorithm is presented. The algorithm
is obtained by minimizing the two-parameter quadratic model of the objective function in
which the symmetric approximation of the Hessian matrix satisfies the general
quasi-Newton equation. Through the general quasi-Newton equation the search direction
includes a parameter $\omega$, which is determined by the formal equality between the
search direction used in the suggested algorithm and the Newton direction. It is proved
that the best value of this parameter is $\omega = 1$. The direction satisfies both the
descent and the conjugacy conditions. The new approximation of the minimum is obtained
by the general Wolfe line search combined with a by now standard acceleration technique.
Under standard assumptions, both for uniformly convex functions and for general nonlinear
functions, the global convergence of the algorithm is proved. Numerical experiments on a
collection of 800 large-scale unconstrained optimization test problems of different
complexity show that these ingredients yield a search direction that defines a very
efficient and robust three-term conjugate gradient algorithm. Numerical comparisons of
this algorithm versus the well known conjugate gradient algorithms ASCALCG, CONMIN,
AHYBRIDM, CG-DESCENT, THREECG and TTCG, as well as the limited memory quasi-Newton
algorithm LBFGS (m=5) and the truncated Newton TN, show that our algorithm is more
efficient and more robust.


Keywords: Large scale unconstrained optimization, Two parameters quadratic model, Generalized

1. Introduction


For solving large-scale unconstrained optimization problems

$$\min f(x), \qquad (1.1)$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, supposed
to be bounded from below, the three-term conjugate gradient method developed in this paper
starts from an initial guess $x_0 \in \mathbb{R}^n$ and generates the sequence $\{x_k\}$ as:

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (1.2)$$

where $\alpha_k > 0$ is obtained by line search, and the directions $d_k$ are computed as:


$$d_{k+1} = -g_{k+1} + a_k s_k + b_k y_k, \qquad d_0 = -g_0. \qquad (1.3)$$

In (1.3), $a_k$ and $b_k$ are known as three-term conjugate gradient parameters or
coefficients. As usual, $s_k = x_{k+1} - x_k$, $g_k = \nabla f(x_k)$ and
$y_k = g_{k+1} - g_k$. Observe that the search direction $d_{k+1}$ is computed as a linear
combination of $g_{k+1}$, $s_k$ and $y_k$, in which the coefficient of $g_{k+1}$ is $-1$.
The line search in conjugate gradient algorithms is often based on the general Wolfe
conditions [36, 37]:

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho \alpha_k g_k^T d_k, \qquad (1.4)$$

$$g_{k+1}^T d_k \ge \sigma g_k^T d_k, \qquad (1.5)$$


where $d_k$ is a descent direction and $0 < \rho < \sigma < 1$. However, for some conjugate
gradient algorithms a stronger version of the Wolfe line search conditions, given by
(1.4) and

$$|g_{k+1}^T d_k| \le -\sigma g_k^T d_k, \qquad (1.6)$$

is needed to ensure convergence and to enhance stability.
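
To make the three inequalities concrete, the following minimal sketch evaluates (1.4), (1.5) and (1.6) for a given trial step. The function name `wolfe_conditions` and the list-based vector type are illustrative choices, not part of the paper; the default values of `rho` and `sigma` are the ones reported in the numerical section.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def wolfe_conditions(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x: Vector,
    d: Vector,
    alpha: float,
    rho: float = 1e-4,
    sigma: float = 0.8,
) -> Tuple[bool, bool, bool]:
    """Return (sufficient decrease (1.4), curvature (1.5), strong curvature (1.6))."""
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    slope_old = dot(grad(x), d)       # g_k^T d_k, negative for a descent direction
    slope_new = dot(grad(x_new), d)   # g_{k+1}^T d_k
    sufficient_decrease = f(x_new) - f(x) <= rho * alpha * slope_old
    curvature = slope_new >= sigma * slope_old
    strong_curvature = abs(slope_new) <= -sigma * slope_old
    return sufficient_decrease, curvature, strong_curvature
```

A step that is too short typically passes (1.4) and fails (1.5), while a step that is too long fails (1.4); this is the behaviour a Wolfe line search exploits when it brackets an acceptable step.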


Different three-term conjugate gradient algorithms correspond to different choices
for the scalar parameters $a_k$ and $b_k$. In this context the papers by Beale [10],
McGuire and Wolfe [21], Deng and Li [14] and Dai and Yuan [13], Nazareth [26], 
Zhang, Zhou and Li [38, 39], Zhang, Xiao and Wei [40], Al-Bayati and Sharif [1], 
Cheng [11], Narushima, Yabe and Ford [23], Andrei [7, 8, 9] present different 
versions of three-term conjugate gradient algorithms together with their 
properties, global convergence and numerical performances. All these three-term 
conjugate gradient algorithms are obtained by modification of classical conjugate 
gradient algorithms to satisfy the descent and in some cases the conjugacy 
conditions. Generally, these three-term conjugate gradient algorithms are more
efficient and more robust than the classical conjugate gradient algorithms of Hestenes
and Stiefel [19], Fletcher and Reeves [16], Polak-Ribière-Polyak [30, 31],
Liu and Storey [20], Dai and Yuan [13], etc.


In this paper we suggest another way to get three-term conjugate gradient
algorithms by minimizing the two-parameter quadratic model of the function $f$.


The idea is to consider the quadratic approximation of the function $f$ at the
current point and to determine the search direction (1.3) by minimizing this
quadratic model with respect to the parameters $a_k$ and $b_k$. It is assumed that the
symmetric approximation of the Hessian matrix satisfies the general quasi-Newton
equation, which depends on a positive parameter $\omega$. The three-term conjugate
gradient parameters $a_k$ and $b_k$ are determined as the solution of an algebraic system
of two linear equations. Section 2 presents this idea, as well as the procedure for
determining the positive parameter $\omega$. To determine a good value for $\omega$, the
formal equality between the search direction (1.3) and the best known direction, the
Newton direction, is used. In Section 3 the corresponding three-term conjugate gradient
algorithm TTSCAL is presented in the context of acceleration of the iterations. Section 4
is dedicated to the global convergence analysis of the algorithm. It is shown that under
the standard assumptions, both for uniformly convex functions and for general
functions, the search direction is bounded. Section 5 includes the numerical results
with TTSCAL and some comparisons versus the known conjugate gradient
algorithms ASCALCG [2, 3], CONMIN [35], AHYBRIDM [6], CG-DESCENT
[18], THREECG [9], TTCG [8] as well as versus LBFGS [27] and TN [24], on a
collection of 800 large-scale unconstrained optimization test functions. It is shown
that the three-term conjugate gradient algorithm TTSCAL, corresponding to the
minimization of the two-parameter quadratic model of the function $f$, is more
efficient and more robust than all the algorithms considered in this numerical study.


2. Minimization of the two-parameter quadratic model of the function $f$

At the $k$-th iteration of the algorithm, let us assume that an inexact Wolfe line
search was executed, that is, the step-length $\alpha_k$ satisfying (1.4) and (1.5) was
computed. With this value of $\alpha_k$ the elements $s_k = x_{k+1} - x_k$ and
$y_k = g_{k+1} - g_k$ can be computed. Now, let us consider the following quadratic
approximation of the function $f$ at $x_{k+1}$:

$$\Phi_{k+1}(d) = g_{k+1}^T d + \frac{1}{2} d^T B_{k+1} d, \qquad (2.1)$$

where $B_{k+1}$ is a symmetric and positive definite approximation of the Hessian
$\nabla^2 f(x_{k+1})$ and $d$ is the direction to be determined. The direction
$d_{k+1}$ is computed as:

$$d_{k+1} = -g_{k+1} + a_k s_k + b_k y_k, \qquad (2.2)$$

where the scalars $a_k$ and $b_k$ are determined as the solution of the following
minimization problem:

$$\min_{a_k \in \mathbb{R},\, b_k \in \mathbb{R}} \Phi_{k+1}(d_{k+1}). \qquad (2.3)$$

Introducing $d_{k+1}$ from (2.2) into the minimization problem (2.3), the parameters
$a_k$ and $b_k$ are determined as the solution of the following linear algebraic system:

$$a_k (s_k^T B_{k+1} s_k) + b_k (s_k^T B_{k+1} y_k) = g_{k+1}^T B_{k+1} s_k - s_k^T g_{k+1}, \qquad (2.4a)$$

$$a_k (s_k^T B_{k+1} y_k) + b_k (y_k^T B_{k+1} y_k) = g_{k+1}^T B_{k+1} y_k - y_k^T g_{k+1}. \qquad (2.4b)$$


Suppose that the symmetric and positive definite matrix $B_{k+1}$ is an
approximation of the Hessian $\nabla^2 f(x_{k+1})$ such that
$B_{k+1} s_k = \omega^{-1} y_k$ with $\omega > 0$, known as the general quasi-Newton
equation. With this the linear algebraic system (2.4) becomes:

$$a_k (y_k^T s_k) + b_k \|y_k\|^2 = y_k^T g_{k+1} - \omega s_k^T g_{k+1}, \qquad (2.5a)$$

$$a_k \|y_k\|^2 + b_k \omega (y_k^T B_{k+1} y_k) = \omega g_{k+1}^T B_{k+1} y_k - \omega y_k^T g_{k+1}. \qquad (2.5b)$$
In order to solve the linear system (2.5) we must evaluate the quantities
$\eta_k = y_k^T B_{k+1} y_k$ and $\theta_k = g_{k+1}^T B_{k+1} y_k$. Suppose that
$B_{k+1}$ is positive definite. Now, using the classical quasi-Newton equation
$B_{k+1} s_k = y_k$ we have:

$$\eta_k = y_k^T B_{k+1} y_k
= \frac{(y_k^T B_{k+1} y_k)(s_k^T B_{k+1} s_k)}{(y_k^T B_{k+1} s_k)^2} \cdot \frac{(y_k^T B_{k+1} s_k)^2}{s_k^T B_{k+1} s_k}
= \frac{\|B_{k+1}^{1/2} y_k\|^2 \, \|B_{k+1}^{1/2} s_k\|^2}{\left[(B_{k+1}^{1/2} y_k)^T (B_{k+1}^{1/2} s_k)\right]^2} \cdot \frac{(y_k^T y_k)^2}{y_k^T s_k}
= \frac{1}{\cos^2 \langle B_{k+1}^{1/2} y_k, B_{k+1}^{1/2} s_k \rangle} \cdot \frac{(y_k^T y_k)^2}{y_k^T s_k}. \qquad (2.6)$$

Since $B_{k+1}$ is unknown, it follows that the quantity
$\cos^2 \langle B_{k+1}^{1/2} y_k, B_{k+1}^{1/2} s_k \rangle$ in (2.6) is unknown. However,
since the mean value of $\cos^2 \xi$ is $1/2$, it seems reasonable to replace the above
quantity $\cos^2 \langle B_{k+1}^{1/2} y_k, B_{k+1}^{1/2} s_k \rangle$ in (2.6) by $1/2$.
Therefore, $\eta_k$ can be computed as:

$$\eta_k = 2 \frac{(y_k^T y_k)^2}{y_k^T s_k}. \qquad (2.7)$$


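Now, in order to compute $\theta_k$ we can use the BFGS update initialized with the
identity matrix:

$$\theta_k = g_{k+1}^T B_{k+1} y_k
= g_{k+1}^T \left[ I + \frac{y_k y_k^T}{y_k^T s_k} - \frac{s_k s_k^T}{s_k^T s_k} \right] y_k
= g_{k+1}^T y_k + \frac{(y_k^T g_{k+1})(y_k^T y_k)}{y_k^T s_k} - \frac{(s_k^T g_{k+1})(s_k^T y_k)}{s_k^T s_k}. \qquad (2.8)$$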


It is worth saying that another way to compute $\theta_k$ is to use the BFGS update
initialized, for example, from the scaling matrix $\left((s_k^T y_k)/\|s_k\|^2\right) I$.
However, we are interested in using (2.8) in our algorithm. With these developments the
linear algebraic system (2.5) becomes:

$$a_k (y_k^T s_k) + b_k \|y_k\|^2 = y_k^T g_{k+1} - \omega s_k^T g_{k+1}, \qquad (2.9a)$$

$$a_k \|y_k\|^2 + b_k \omega \eta_k = \omega \theta_k - \omega y_k^T g_{k+1}. \qquad (2.9b)$$


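Now, using (2.7), observe that the determinant of the matrix of the linear system
(2.9) is:

$$\Delta_k = \omega \eta_k (y_k^T s_k) - (y_k^T y_k)^2 = (2\omega - 1)(y_k^T y_k)^2 > 0 \qquad (2.10)$$

if $\omega > 1/2$ and, of course, $y_k \ne 0$.

Supposing that $\Delta_k > 0$, then from the linear algebraic system (2.9) we get:

$$a_k = \frac{1}{\Delta_k} \left[ \omega \eta_k \left( y_k^T g_{k+1} - \omega s_k^T g_{k+1} \right) - \omega \|y_k\|^2 \left( \theta_k - y_k^T g_{k+1} \right) \right], \qquad (2.11)$$

$$b_k = \frac{1}{\Delta_k} \left[ \omega (y_k^T s_k) \left( \theta_k - y_k^T g_{k+1} \right) - \|y_k\|^2 \left( y_k^T g_{k+1} - \omega s_k^T g_{k+1} \right) \right]. \qquad (2.12)$$

Therefore, if $\Delta_k > 0$, i.e. $y_k \ne 0$ and $\omega > 1/2$, then the search direction
is computed as in (2.2), where the scalars $a_k$ and $b_k$ are computed as in (2.11) and
(2.12) respectively.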
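
The closed forms (2.7), (2.8) and (2.10)-(2.12) translate directly into a few inner products. The sketch below is only an illustration under that reading of the formulas; the function name `three_term_coefficients` and the list-based vectors are hypothetical choices, and the error raised for a non-positive determinant is an added guard rather than something prescribed by the text.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def three_term_coefficients(
    g_new: Vector, s: Vector, y: Vector, omega: float = 1.0
) -> Tuple[float, float]:
    """Return (a_k, b_k) solving the 2x2 system (2.9) for given g_{k+1}, s_k, y_k."""
    ys, yy, ss = dot(y, s), dot(y, y), dot(s, s)
    yg, sg = dot(y, g_new), dot(s, g_new)
    eta = 2.0 * yy * yy / ys                      # (2.7)
    theta = yg + yg * yy / ys - sg * ys / ss      # (2.8)
    delta = omega * eta * ys - yy * yy            # (2.10)
    if delta <= 0.0:
        # Added guard: (2.11)-(2.12) are only used when the determinant is positive.
        raise ValueError("determinant (2.10) is not positive")
    a = (omega * eta * (yg - omega * sg) - omega * yy * (theta - yg)) / delta  # (2.11)
    b = (omega * ys * (theta - yg) - yy * (yg - omega * sg)) / delta           # (2.12)
    return a, b
```

Substituting the returned pair back into (2.9a) and (2.9b) is a cheap consistency check, and it is also a convenient way to test an implementation.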
If the line search is exact, that is $s_k^T g_{k+1} = 0$, then from (2.11) and (2.12) we have

$$a_k = \frac{\omega}{2\omega - 1} \cdot \frac{y_k^T g_{k+1}}{y_k^T s_k}, \qquad (2.13)$$

$$b_k = \frac{\omega - 1}{2\omega - 1} \cdot \frac{y_k^T g_{k+1}}{y_k^T y_k}. \qquad (2.14)$$

Observe that if $\omega = 1$, then $a_k = (y_k^T g_{k+1})/(y_k^T s_k)$ and $b_k = 0$, i.e. the
search direction is computed as:

$$d_{k+1} = -g_{k+1} + \frac{y_k^T g_{k+1}}{y_k^T s_k} s_k, \qquad (2.15)$$


which is exactly the Hestenes and Stiefel conjugate gradient algorithm. 
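
The reduction to the exact line search case can be checked by direct substitution. The lines below are a worked sketch that assumes only $s_k^T g_{k+1} = 0$ together with $\eta_k = 2\|y_k\|^4/(y_k^T s_k)$, the BFGS-based expression for $\theta_k$, and $\Delta_k = (2\omega - 1)\|y_k\|^4$:

```latex
\begin{aligned}
\theta_k - y_k^T g_{k+1} &= \frac{(y_k^T g_{k+1})\,\|y_k\|^2}{y_k^T s_k}, \\
a_k &= \frac{1}{\Delta_k}\left[\frac{2\omega\|y_k\|^4}{y_k^T s_k}\, y_k^T g_{k+1}
       - \frac{\omega\|y_k\|^4}{y_k^T s_k}\, y_k^T g_{k+1}\right]
     = \frac{\omega}{2\omega-1}\cdot\frac{y_k^T g_{k+1}}{y_k^T s_k}, \\
b_k &= \frac{1}{\Delta_k}\left[\omega\|y_k\|^2\, y_k^T g_{k+1} - \|y_k\|^2\, y_k^T g_{k+1}\right]
     = \frac{\omega-1}{2\omega-1}\cdot\frac{y_k^T g_{k+1}}{\|y_k\|^2}.
\end{aligned}
```

Setting $\omega = 1$ makes the factor $\omega/(2\omega-1)$ equal to one and the factor $\omega - 1$ vanish, which is how the Hestenes and Stiefel direction is recovered.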


Proposition 2.1. Suppose that $B_{k+1} > 0$. Then $d_{k+1}$ given by (2.2), where the
scalars $a_k$ and $b_k$ are computed as in (2.11) and (2.12) respectively, is a descent
direction.

Proof. From (2.1) observe that $\Phi_{k+1}(0) = 0$. Since $B_{k+1} > 0$ and $d_{k+1}$ given by
(2.2), (2.11) and (2.12) is the solution of (2.3), it follows that
$\Phi_{k+1}(d_{k+1}) \le 0$. Therefore,

$$g_{k+1}^T d_{k+1} \le -\frac{1}{2} d_{k+1}^T B_{k+1} d_{k+1} < 0, \qquad (2.16)$$

i.e. $d_{k+1}$ is a descent direction. ∎


Proposition 2.2. Suppose that the search direction $d_{k+1}$ is given by (2.2), where the
scalars $a_k$ and $b_k$ satisfy the linear algebraic system (2.9). Then the direction
$d_{k+1}$ satisfies the Dai-Liao conjugacy condition
$y_k^T d_{k+1} = -\omega s_k^T g_{k+1}$ with $\omega > 0$.

Proof. Since $d_{k+1}$ is given by (2.2), it follows that $y_k^T d_{k+1}$ is given by (2.9a),
which is exactly the Dai-Liao conjugacy condition [12]. ∎
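
Written out, the step behind this proof is a single substitution. Taking the inner product of the direction $d_{k+1} = -g_{k+1} + a_k s_k + b_k y_k$ with $y_k$ and using the first equation of the linear system for $a_k$ and $b_k$ gives:

```latex
\begin{aligned}
y_k^T d_{k+1} &= -y_k^T g_{k+1} + a_k\,(y_k^T s_k) + b_k\,\|y_k\|^2 \\
              &= -y_k^T g_{k+1} + \left(y_k^T g_{k+1} - \omega\, s_k^T g_{k+1}\right) \\
              &= -\omega\, s_k^T g_{k+1}.
\end{aligned}
```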
Observe that our algorithm, in which the search direction is given by (2.2), where
the scalars $a_k$ and $b_k$ are computed as in (2.11) and (2.12) respectively, and the
step-length is obtained by the Wolfe line search (1.4) and (1.5), is a conjugate
gradient algorithm with three terms. Our three-term conjugate gradient algorithm
is based on the minimization of the quadratic approximation of the function $f$
at the current point, in which the search direction $d_{k+1}$ is selected as a linear
combination of $g_{k+1}$, $s_k$ and $y_k$, where the coefficient of $g_{k+1}$ is $-1$.

Remark 2.1. Another possibility to define the search direction is to introduce in
(2.2) a scalar or a matrix coefficient multiplying $g_{k+1}$ as:

$$d_{k+1} = -A_{k+1} g_{k+1} + a_k s_k + b_k y_k. \qquad (2.17)$$

However, this is not as good as, for example, the limited memory quasi-Newton
methods, since if $A_{k+1}$ contains enough useful information about the inverse
Hessian of the function $f$, we are better off using the search direction
$d_{k+1} = -A_{k+1} g_{k+1}$. The addition of the last two terms in (2.17) may prevent
$d_{k+1}$ from being a descent direction, without saying anything about the definition of
the symmetric and positive definite matrix $A_{k+1}$, which requires several vectors,
making the storage requirements similar to those of limited memory methods.
This is the main reason we consider in our three-term conjugate gradient
algorithm the search direction as in (2.2).


Observe that in order to define the search direction (2.2) through (2.11) and (2.12) we
must establish a procedure for computing the parameter $\omega$. There are several
possibilities, but in this paper we are interested in using the best search direction we
know, i.e. the Newton direction. As a matter of fact, when the initial point $x_0$ is
close enough to the local minimum point $x^*$, the best search direction to be
used at the current point $x_{k+1}$ is the Newton direction
$-\nabla^2 f(x_{k+1})^{-1} g_{k+1}$. Therefore, our motivation is to select the
coefficients $a_k$ and $b_k$ in (2.2) in such a manner that for every $k \ge 1$ the
direction $d_{k+1}$ is the best search direction we know, i.e. the Newton direction.
Hence, the parameter $\omega$ in the coefficients $a_k$ and $b_k$ from (2.2) can be
determined by the relation

$$-g_{k+1} + a_k s_k + b_k y_k = -\nabla^2 f(x_{k+1})^{-1} g_{k+1}. \qquad (2.18)$$

Introducing the algebraic expressions of $a_k$ and $b_k$ from (2.11) and (2.12)
respectively in (2.18), after some simple algebra we get the following quadratic
equation:

$$2\omega^2 - 3\omega + 1 = 0, \qquad (2.19)$$

which admits the solutions $\omega = 1$ and $\omega = 1/2$.
The solution $\omega = 1/2$ is not suitable (see the condition (2.10)). The only
admissible solution is $\omega = 1$. Observe that the Newton direction is used here only
as a simple technical ingredient to compute a good value for the parameter $\omega$.
3. TTSCAL algorithm

In this section we present the algorithm TTSCAL, where the search direction is
given by (2.2) and the coefficients $a_k$ and $b_k$ are computed as in (2.11) and (2.12)
respectively with $\omega = 1$. In our algorithm we use the acceleration scheme
presented in [5]. Basically, the acceleration scheme modifies the step length
$\alpha_k$ in a multiplicative manner to improve the reduction of the function values
along the iterations. As in [5], in the accelerated algorithm, instead of (1.2), the new
estimate of the minimum point is computed as

$$x_{k+1} = x_k + \xi_k \alpha_k d_k, \qquad (3.1)$$

where

$$\xi_k = -\frac{\bar a_k}{\bar b_k}, \qquad (3.2)$$

$\bar a_k = \alpha_k g_k^T d_k$, $\bar b_k = -\alpha_k (g_k - g_z)^T d_k$,
$g_z = \nabla f(z)$ and $z = x_k + \alpha_k d_k$ (the bars distinguish these scalars from
the coefficients $a_k$ and $b_k$ in (2.2)). Hence, if $\bar b_k > 0$, then the new estimate
of the solution is computed as $x_{k+1} = x_k + \xi_k \alpha_k d_k$; otherwise
$x_{k+1} = x_k + \alpha_k d_k$. Observe that
$\bar b_k = \alpha_k (g_z - g_k)^T d_k = \alpha_k^2 \left( d_k^T \nabla^2 f(\bar x_k) d_k \right)$,
where $\bar x_k$ is a point on the line segment connecting $x_k$ and $z$. Since
$\alpha_k > 0$, it follows that for convex functions $\bar b_k \ge 0$.

For uniformly convex functions, the linear convergence of the acceleration
scheme is proved in [5].
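
A minimal sketch of the acceleration step may help fix ideas. It assumes the reading of the scheme in which the two scalars are $\alpha_k g_k^T d_k$ and $\alpha_k (g_z - g_k)^T d_k$ and the multiplier is the negative of their ratio; the function name `accelerated_step` is a hypothetical choice, not taken from [5].

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def accelerated_step(
    grad: Callable[[Vector], Vector], x: Vector, d: Vector, alpha: float
) -> Tuple[Vector, float]:
    """Return (x_{k+1}, xi_k) for a step alpha already accepted by the line search."""
    z = [xi + alpha * di for xi, di in zip(x, d)]
    g, g_z = grad(x), grad(z)
    a_bar = alpha * dot(g, d)                                       # alpha_k g_k^T d_k
    b_bar = alpha * dot([gz - gk for gz, gk in zip(g_z, g)], d)     # alpha_k (g_z - g_k)^T d_k
    if b_bar > 0.0:
        xi = -a_bar / b_bar
        return [xk + xi * alpha * dk for xk, dk in zip(x, d)], xi
    return z, 1.0   # no acceleration: keep the line search point
```

For a strictly convex quadratic the model along $d_k$ is exact, so the accelerated point is the exact minimizer along the direction regardless of the step $\alpha_k$ returned by the line search.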

Therefore, taking into consideration this acceleration scheme and using the
definitions of $g_k$, $s_k$ and $y_k$, the following three-term conjugate gradient
algorithm can be presented.


TTSCAL algorithm

Step 1.  Select a starting point $x_0 \in \mathrm{dom} f$ and compute $f_0 = f(x_0)$ and
         $g_0 = \nabla f(x_0)$. Select some positive values for $\rho$ and $\sigma$.
         Set $d_0 = -g_0$ and $k = 0$. Consider $\omega = 1$.

Step 2.  Test a criterion for stopping the iterations. If the test is satisfied, then
         stop; otherwise continue with step 3.

Step 3.  Determine the steplength $\alpha_k$ using the Wolfe line search conditions
         (1.4) and (1.5).

Step 4.  Compute: $z = x_k + \alpha_k d_k$, $g_z = \nabla f(z)$ and $y_k = g_k - g_z$.

Step 5.  Compute: $\bar a_k = \alpha_k g_k^T d_k$ and $\bar b_k = -\alpha_k y_k^T d_k$.

Step 6.  Acceleration scheme. If $\bar b_k > 0$, then compute
         $\xi_k = -\bar a_k / \bar b_k$ and update the variables as
         $x_{k+1} = x_k + \xi_k \alpha_k d_k$, otherwise update the variables as
         $x_{k+1} = x_k + \alpha_k d_k$. Compute $f_{k+1}$ and $g_{k+1}$. Compute
         $y_k = g_{k+1} - g_k$ and $s_k = x_{k+1} - x_k$.

Step 7.  If $y_k^T y_k > 0$ (i.e. $\Delta_k > 0$), then compute $a_k$ and $b_k$ as in (2.11)
         and (2.12) respectively, otherwise set $a_k = (y_k^T g_{k+1})/(y_k^T s_k)$ and
         $b_k = 0$.

Step 8.  Compute the search direction as: $d_{k+1} = -g_{k+1} + a_k s_k + b_k y_k$.

Step 9.  Powell restart criterion. If $|g_{k+1}^T g_k| > 0.2 \|g_{k+1}\|^2$, then set
         $d_{k+1} = -g_{k+1}$.

Step 10. Consider $k = k + 1$ and go to step 2. ∎
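
The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Fortran implementation used for the experiments: the line search `wolfe_step` is a simple bisection stand-in rather than a cubic interpolation scheme, the fallback to the negative gradient when $y_k^T s_k \le 0$ or when the new direction is not a descent one is an added safeguard, and the function names are hypothetical.

```python
from math import sqrt
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def axpy(t: float, u: Vector, v: Vector) -> Vector:
    """Return t*u + v."""
    return [t * a + b for a, b in zip(u, v)]


def wolfe_step(f, grad, x, d, rho, sigma, alpha, max_trials=60) -> float:
    """Stand-in bisection search for a step satisfying (1.4) and (1.5)."""
    lo, hi = 0.0, float("inf")
    f0, slope = f(x), dot(grad(x), d)
    for _ in range(max_trials):
        trial = axpy(alpha, d, x)
        if f(trial) - f0 > rho * alpha * slope:       # (1.4) fails: step too long
            hi = alpha
        elif dot(grad(trial), d) < sigma * slope:     # (1.5) fails: step too short
            lo = alpha
        else:
            break
        alpha = 2.0 * lo if hi == float("inf") else 0.5 * (lo + hi)
    return alpha


def ttscal(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
           x0: Vector, rho: float = 1e-4, sigma: float = 0.8,
           tol: float = 1e-6, max_iter: int = 10000) -> Tuple[Vector, int]:
    x, g = list(x0), grad(x0)
    d = [-gi for gi in g]                                       # Step 1
    alpha = 1.0
    for k in range(max_iter):
        if max(abs(gi) for gi in g) <= tol:                     # Step 2
            return x, k
        alpha = wolfe_step(f, grad, x, d, rho, sigma, alpha)    # Step 3
        z = axpy(alpha, d, x)                                   # Step 4
        g_z = grad(z)
        a_bar = alpha * dot(g, d)                               # Step 5
        b_bar = alpha * dot(axpy(-1.0, g, g_z), d)
        if b_bar > 0.0:                                         # Step 6
            x_new = axpy(-a_bar / b_bar * alpha, d, x)
            g_new = grad(x_new)
        else:
            x_new, g_new = z, g_z
        s, y = axpy(-1.0, x, x_new), axpy(-1.0, g, g_new)
        ys, yy, ss = dot(y, s), dot(y, y), dot(s, s)
        yg, sg = dot(y, g_new), dot(s, g_new)
        d_new = [-gi for gi in g_new]
        if ys > 0.0 and yy > 0.0:                               # Step 7, omega = 1
            eta = 2.0 * yy * yy / ys                            # (2.7)
            t = yg * yy / ys - sg * ys / ss                     # theta_k - y_k^T g_{k+1}
            a = (eta * (yg - sg) - yy * t) / (yy * yy)          # (2.11)
            b = (ys * t - yy * (yg - sg)) / (yy * yy)           # (2.12)
            d_new = [di + a * si + b * yi                       # Step 8
                     for di, si, yi in zip(d_new, s, y)]
        restart = abs(dot(g_new, g)) > 0.2 * dot(g_new, g_new)  # Step 9
        if restart or dot(g_new, d_new) >= 0.0:                 # added descent safeguard
            d_new = [-gi for gi in g_new]
        dd_new = dot(d_new, d_new)
        if dd_new > 0.0:                                        # initial guess for next step
            alpha *= sqrt(dot(d, d) / dd_new)
        x, g, d = x_new, g_new, d_new                           # Step 10
    return x, max_iter
```

On a strictly convex quadratic the acceleration step performs an exact line minimization, so the sketch behaves like the linear conjugate gradient method; this makes small quadratics a convenient smoke test.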


If $f$ is bounded along the direction $d_k$, then there exists a stepsize $\alpha_k$
satisfying the Wolfe line search conditions (1.4) and (1.5). In our algorithm, when the
Powell restart condition is satisfied, we restart the algorithm with the negative
gradient $-g_{k+1}$. Under reasonable assumptions, the Wolfe conditions and the
Powell restart criterion are sufficient to prove the global convergence of the
algorithm. The first trial of the step length crucially affects the practical behavior
of the algorithm. At any iteration $k \ge 1$ the starting guess for the step $\alpha_k$ in
the line search is computed as $\alpha_{k-1} \|d_{k-1}\| / \|d_k\|$. This proves to be one
of the best selections of the starting guess in the line search.


4. Convergence analysis

Assume that:

(i) The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, i.e. there
exists a positive constant $B > 0$ such that $\|x\| \le B$ for all $x \in S$.

(ii) In a neighbourhood $N$ of $S$ the function $f$ is continuously differentiable
and its gradient is Lipschitz continuous, i.e. there exists a constant $L > 0$
such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in N$.

Under these assumptions on $f$ there exists a constant $\Gamma \ge 0$ such that
$\|\nabla f(x)\| \le \Gamma$ for all $x \in S$. Observe that the assumption that the
function $f$ is bounded below is weaker than the usual assumption that the level set is
bounded.


Although the search directions generated by (2.2), (2.11) and (2.12) are always
descent directions, to ensure convergence of the algorithm we need to constrain
the choice of the step-length $\alpha_k$. The following proposition shows that the Wolfe
line search always gives a lower bound for the step-length $\alpha_k$.


Proposition 4.1. Suppose that $d_k$ is a descent direction and the gradient
$\nabla f$ satisfies the Lipschitz condition

$$\|\nabla f(x) - \nabla f(x_k)\| \le L \|x - x_k\|$$

for all $x$ on the line segment connecting $x_k$ and $x_{k+1}$, where $L$ is a positive
constant. If the line search satisfies the Wolfe conditions (1.4) and (1.5), then

$$\alpha_k \ge \frac{(1 - \sigma) |g_k^T d_k|}{L \|d_k\|^2}. \qquad (4.1)$$

Proof. Subtracting $g_k^T d_k$ from both sides of (1.5) and using the Lipschitz
continuity we get

$$(\sigma - 1) g_k^T d_k \le (g_{k+1} - g_k)^T d_k = y_k^T d_k \le \|y_k\| \|d_k\| \le \alpha_k L \|d_k\|^2.$$

Since $d_k$ is a descent direction and $\sigma < 1$, (4.1) follows immediately. ∎


The following proposition proves that in the above three-term conjugate gradient 
method, under the general Wolfe line search (1.4) and (1.5), the Zoutendyk [41] 
condition holds. 

Proposition 4.2. Suppose that the assumptions (i) and (ii) hold. Consider the
algorithm (1.2) and (2.2) with (2.11) and (2.12), where $d_k$ is a descent direction
and $\alpha_k$ is computed by the general Wolfe line search (1.4) and (1.5). Then

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty. \qquad (4.2)$$

Proof. From (1.4) and Proposition 4.1 we get

$$f_k - f_{k+1} \ge -\rho \alpha_k g_k^T d_k \ge \rho \frac{(1 - \sigma)(g_k^T d_k)^2}{L \|d_k\|^2}.$$

Therefore, from assumption (i) we get the Zoutendyk condition (4.2). ∎
In [32] Powell proved that in conjugate gradient algorithms the iteration can fail,
in the sense that $\|g_k\| \ge \gamma > 0$ for all $k$, only if $\|d_k\| \to \infty$
sufficiently rapidly. More exactly, the sequence of gradient norms $\|g_k\|$ can be
bounded away from zero only if $\sum_{k \ge 0} 1/\|d_k\| < \infty$. This observation is
fundamental and can be used for the global convergence analysis of nonlinear conjugate
gradient algorithms. For any conjugate gradient method with strong Wolfe line search
(1.4) and (1.6) the following general result holds [28].


Proposition 4.3. Suppose that the assumptions (i) and (ii) hold and consider any
conjugate gradient algorithm (1.2), where $d_k$ is a descent direction and $\alpha_k$ is
obtained by the strong Wolfe line search (1.4) and (1.6). If

$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty, \qquad (4.3)$$

then

$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (4.4)$$

For uniformly convex functions we can prove that the norm of the direction $d_{k+1}$
generated by (2.2) and (2.11)-(2.12) is bounded above. Therefore by Proposition
4.3 we can prove the following result.


Theorem 4.1. Suppose that the assumptions (i) and (ii) hold and consider the
algorithm (1.2) and (2.2) with (2.11) and (2.12) (with $\omega = 1$), where $d_k$ is a
descent direction and $\alpha_k$ is computed by the Wolfe line search (1.4) and (1.5).
Suppose that $\nabla f$ satisfies the Lipschitz condition and $f$ is a uniformly convex
function on $S$, i.e. there exists a constant $\mu > 0$ such that

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \|x - y\|^2 \qquad (4.5)$$

for all $x, y \in N$. Then

$$\lim_{k \to \infty} \|g_k\| = 0. \qquad (4.6)$$


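Proof. From Lipschitz continuity we have $\|y_k\| \le L \|s_k\|$. On the other hand, from
uniform convexity it follows that $y_k^T s_k \ge \mu \|s_k\|^2$. Now, using the Cauchy
inequality, from Lipschitz continuity and uniform convexity we have:

$$|\theta_k - y_k^T g_{k+1}|
= \left| \frac{(y_k^T g_{k+1})(y_k^T y_k)}{y_k^T s_k} - \frac{(s_k^T g_{k+1})(s_k^T y_k)}{s_k^T s_k} \right|
\le \frac{\|y_k\| \Gamma \|y_k\|^2}{\mu \|s_k\|^2} + \frac{\|s_k\| \Gamma \|y_k\| \|s_k\|}{\|s_k\|^2}
\le \frac{\Gamma L^3}{\mu} \|s_k\| + \Gamma L \|s_k\|
= \frac{\Gamma L}{\mu} (L^2 + \mu) \|s_k\| = M_1 \|s_k\|. \qquad (4.7)$$

On the other hand, for $\omega = 1$ we have

$$|y_k^T g_{k+1} - \omega s_k^T g_{k+1}| \le |y_k^T g_{k+1}| + |s_k^T g_{k+1}| \le \Gamma (L + 1) \|s_k\| = M_2 \|s_k\|. \qquad (4.8)$$

From uniform convexity and the Cauchy inequality observe that
$\mu \|s_k\|^2 \le y_k^T s_k \le \|y_k\| \|s_k\|$, i.e.

$$\mu \|s_k\| \le \|y_k\|. \qquad (4.9)$$

From (2.11), using (4.7), (4.8) and (4.9) with $\omega = 1$ (so that
$\Delta_k = \|y_k\|^4$), we get:

$$|a_k| \le \frac{2}{y_k^T s_k} |y_k^T g_{k+1} - s_k^T g_{k+1}| + \frac{1}{\|y_k\|^2} |\theta_k - y_k^T g_{k+1}|
\le \frac{2 M_2 \|s_k\|}{\mu \|s_k\|^2} + \frac{M_1 \|s_k\|}{\mu^2 \|s_k\|^2}
= \left( \frac{2 M_2}{\mu} + \frac{M_1}{\mu^2} \right) \frac{1}{\|s_k\|} = \frac{M_3}{\|s_k\|}. \qquad (4.10)$$

Now, from (2.12), using (4.7), (4.8) and (4.9) with $\omega = 1$ we get:

$$|b_k| \le \frac{|y_k^T s_k|}{\|y_k\|^4} |\theta_k - y_k^T g_{k+1}| + \frac{1}{\|y_k\|^2} |y_k^T g_{k+1} - s_k^T g_{k+1}|
\le \frac{\|y_k\| \|s_k\|}{\|y_k\|^4} M_1 \|s_k\| + \frac{1}{\|y_k\|^2} M_2 \|s_k\|
\le \left( \frac{M_1}{\mu^2} + \frac{M_2}{\mu} \right) \frac{1}{\|y_k\|} = \frac{M_4}{\|y_k\|}. \qquad (4.11)$$

Therefore, from (2.2),

$$\|d_{k+1}\| \le \|g_{k+1}\| + |a_k| \|s_k\| + |b_k| \|y_k\| \le \Gamma + M_3 + M_4,$$

showing that (4.3) is true. From Proposition 4.3 it follows that (4.4) is true, which
for uniformly convex functions is equivalent to (4.6). ∎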


Convergence analysis for general nonlinear functions exploits the assumptions (i)
and (ii), as well as the fact that by the Wolfe line search $y_k^T s_k > 0$ (strictly) and
therefore it can be bounded from below by a positive constant, i.e. there exists
$\tau > 0$ such that $y_k^T s_k \ge \tau$.


Proposition 4.4. If the search direction $d_k$ is a descent one, then by the Wolfe line
search condition (1.5) there exists a constant $\tau > 0$ such that
$y_k^T s_k \ge \tau$.

Proof. Since $d_k$ is a descent direction, $\nabla f(x_k)^T d_k < 0$. Consider that
$\alpha_k$ is chosen to satisfy the second Wolfe line search condition (1.5). Then

$$\nabla f(x_k + \alpha_k d_k)^T d_k \ge \sigma \nabla f(x_k)^T d_k > \nabla f(x_k)^T d_k, \qquad (4.12)$$

since $0 < \sigma < 1$ and $\nabla f(x_k)^T d_k < 0$. Accordingly,

$$\nabla f(x_k + \alpha_k d_k)^T d_k - \nabla f(x_k)^T d_k \ge (\sigma - 1) \nabla f(x_k)^T d_k. \qquad (4.13)$$

Therefore, from (4.13) we get:

$$(\nabla f(x_k + \alpha_k d_k) - \nabla f(x_k))^T d_k > 0, \qquad (4.14)$$

i.e. $y_k^T d_k > 0$.

Now, if in the above inequality (4.13) we let $\alpha_k$ approach zero, then the left-hand
side approaches zero, while the right-hand side remains constant at the value
$(\sigma - 1) \nabla f(x_k)^T d_k > 0$, which is impossible. Thus, the Wolfe line search
prevents arbitrarily small choices of $\alpha_k$. Therefore, from (4.14), since
$s_k = x_{k+1} - x_k = \alpha_k d_k$, it follows that $y_k^T s_k > 0$ (strictly). Besides,
by the Archimedean property of the real numbers, there always exists a positive constant
$\tau$, arbitrarily small, such that $y_k^T s_k \ge \tau$. ∎
Theorem 4.2. Suppose that the assumptions (i) and (ii) hold and consider the
algorithm (1.2) and (2.2) with (2.11) and (2.12) (with $\omega = 1$), where $d_k$ is a
descent direction, $\alpha_k$ is computed by the Wolfe line search (1.4)-(1.5) and there
exists a constant $\tau > 0$ such that $y_k^T s_k \ge \tau$ for any $k \ge 1$. Then

$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (4.15)$$

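Proof. Since $g_k^T s_k < 0$ for any $k$, it follows that
$s_k^T g_{k+1} = y_k^T s_k + g_k^T s_k < y_k^T s_k$. By the assumptions (i) and (ii) it
follows that

$$\|y_k\| = \|g_{k+1} - g_k\| = \|\nabla f(x_k + \alpha_k d_k) - \nabla f(x_k)\| \le L \|s_k\| \le 2BL. \qquad (4.16)$$

Suppose that $g_k \ne 0$ for all $k \ge 1$, otherwise a stationary point is obtained. Now,
using the standard assumptions (i) and (ii) we have

$$|\theta_k - y_k^T g_{k+1}|
= \left| \frac{(y_k^T g_{k+1})(y_k^T y_k)}{y_k^T s_k} - \frac{(s_k^T g_{k+1})(s_k^T y_k)}{s_k^T s_k} \right|
\le \frac{\|g_{k+1}\| \|y_k\|^3}{|y_k^T s_k|} + \frac{|y_k^T s_k|^2}{\|s_k\|^2}
\le \frac{2BL\Gamma}{\tau} \|y_k\|^2 + \frac{\|y_k\|^2 \|s_k\|^2}{\|s_k\|^2}
= \left( \frac{2BL\Gamma}{\tau} + 1 \right) \|y_k\|^2 = M_5 \|y_k\|^2. \qquad (4.17)$$

On the other hand,

$$|y_k^T g_{k+1} - s_k^T g_{k+1}| \le |y_k^T g_{k+1}| + |s_k^T g_{k+1}| \le |y_k^T g_{k+1}| + |y_k^T s_k| \le (\Gamma + 2B) \|y_k\|. \qquad (4.18)$$

From (2.11), using (4.17) and (4.18) with $\omega = 1$, we get:

$$|a_k| \le \frac{2}{|y_k^T s_k|} |y_k^T g_{k+1} - s_k^T g_{k+1}| + \frac{1}{\|y_k\|^2} |\theta_k - y_k^T g_{k+1}|
\le \frac{2}{\tau} (\Gamma + 2B) \|y_k\| + \frac{1}{\|y_k\|^2} M_5 \|y_k\|^2
\le \frac{2}{\tau} (\Gamma + 2B) 2BL + M_5 = M_6. \qquad (4.19)$$

On the other hand, from (2.12), using again (4.17) and (4.18), we have:

$$|b_k| \le \frac{|y_k^T s_k|}{\|y_k\|^4} |\theta_k - y_k^T g_{k+1}| + \frac{1}{\|y_k\|^2} |y_k^T g_{k+1} - s_k^T g_{k+1}|
\le \frac{\|y_k\| \|s_k\|}{\|y_k\|^4} M_5 \|y_k\|^2 + \frac{(\Gamma + 2B) \|y_k\|}{\|y_k\|^2}
\le (2B M_5 + \Gamma + 2B) \frac{1}{\|y_k\|} = \frac{M_7}{\|y_k\|}. \qquad (4.20)$$

Therefore, from (2.2),

$$\|d_{k+1}\| \le \|g_{k+1}\| + |a_k| \|s_k\| + |b_k| \|y_k\| \le \Gamma + 2B M_6 + M_7. \qquad (4.21)$$

Now, from Proposition 4.3 it follows that (4.15) is true. ∎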
5. Numerical experiments and discussions

In this section we report some numerical results obtained with an implementation
of the TTSCAL algorithm. The code is written in Fortran and compiled with f77
(default compiler settings) on an Intel Pentium 4 workstation at 1.8 GHz. We
selected 80 large-scale unconstrained optimization test functions in
generalized or extended form presented in [4]. For each test function we have
considered ten numerical experiments with the number of variables increasing as
$n = 1000, 2000, \ldots, 10000$. The TTSCAL algorithm implements the Wolfe line
search conditions with cubic interpolation, $\rho = 0.0001$, $\sigma = 0.8$ and the same
stopping criterion $\|g_k\|_\infty \le 10^{-6}$, where $\|\cdot\|_\infty$ is the maximum
absolute component of a vector. In all the algorithms considered in this numerical study
the maximum number of iterations is limited to 10000. All algorithms implement the Powell
restart technology, i.e. when $|g_{k+1}^T g_k| > 0.2 \|g_{k+1}\|^2$, the search direction is
set to the negative gradient.


The comparisons of algorithms are given in the following context. Let $f_i^{ALG1}$ and
$f_i^{ALG2}$ be the optimal values found by ALG1 and ALG2 for problem $i = 1, \ldots, 800$,
respectively. We say that, in the particular problem $i$, the performance of ALG1
was better than the performance of ALG2 if:

$$\left| f_i^{ALG1} - f_i^{ALG2} \right| < 10^{-3} \qquad (5.1)$$

and the number of iterations (#iter), or the number of function-gradient
evaluations (#fg), or the CPU time of ALG1 was less than the number of
iterations, or the number of function-gradient evaluations, or the CPU time
corresponding to ALG2, respectively. In this numerical study we declare that an
algorithm solved a particular problem if the final point obtained had the lowest
functional value among the tested algorithms (up to the $10^{-3}$ tolerance
specified in (5.1)). This criterion is acceptable for users who are interested in
minimizing functions and not in finding critical points.
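
The pairwise counting rule can be sketched in a few lines. The function name `compare_algorithms`, the `(f_value, cost)` record layout and the default value of `tol` are assumptions made for this illustration; `cost` stands for whichever metric is being compared (#iter, #fg or CPU time).

```python
from typing import List, Tuple

Result = Tuple[float, float]  # (final function value, cost metric) for one problem


def compare_algorithms(
    results_1: List[Result], results_2: List[Result], tol: float = 1e-3
) -> Tuple[int, int, int, int]:
    """Return (wins of ALG1, wins of ALG2, ties, number of comparable problems)."""
    wins_1 = wins_2 = ties = comparable = 0
    for (f1, cost1), (f2, cost2) in zip(results_1, results_2):
        if abs(f1 - f2) >= tol:
            continue  # criterion (5.1) fails: the problem is not counted
        comparable += 1
        if cost1 < cost2:
            wins_1 += 1
        elif cost2 < cost1:
            wins_2 += 1
        else:
            ties += 1
    return wins_1, wins_2, ties, comparable
```

The three counts always add up to the number of comparable problems, which is the bookkeeping behind statements of the form "better in N problems, equal in M problems".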


In the first set of numerical experiments we compare TTSCAL versus ASCALCG
[2, 3], CONMIN [35], AHYBRIDM [6], CG-DESCENT [18], THREECG [9] and
TTCG [8].


ASCALCG, elaborated by Andrei [2, 3], is an accelerated scaled conjugate
gradient algorithm using a double update scheme embedded in the restart
philosophy of Beale-Powell. The basic idea of ASCALCG is to combine the
scaled memoryless BFGS method and the preconditioning technique in the frame
of the conjugate gradient method. The preconditioner, which is also a scaled
memoryless BFGS matrix, is reset when the Beale-Powell restart criterion holds.
The parameter scaling the gradient is selected as the spectral gradient
$\theta_{k+1} = s_k^T s_k / (y_k^T s_k)$. The search direction is computed by a double
quasi-Newton updating scheme as:

$$d_{k+1} = -v + \frac{(g_{k+1}^T s_k) w + (g_{k+1}^T w) s_k}{y_k^T s_k}
- \left( 1 + \frac{y_k^T w}{y_k^T s_k} \right) \frac{g_{k+1}^T s_k}{y_k^T s_k} s_k, \qquad (5.2)$$

where $v = H_{r+1} g_{k+1}$ and $w = H_{r+1} y_k$, and $H_{r+1}$ is the BFGS approximation to
the inverse Hessian initialized with the identity matrix and scaled by the scalar
$\theta_{r+1}$ at the $r$-th iteration where the Beale-Powell restart test is satisfied:

$$H_{r+1} = \theta_{r+1} \left( I - \frac{y_r s_r^T + s_r y_r^T}{y_r^T s_r} \right)
+ \left( 1 + \theta_{r+1} \frac{y_r^T y_r}{y_r^T s_r} \right) \frac{s_r s_r^T}{y_r^T s_r}. \qquad (5.3)$$

The restart direction is computed as $d_{k+1} = -Q_{k+1} g_{k+1}$, where $Q_{k+1}$ is exactly
the BFGS quasi-Newton matrix in which, at every step, the approximation of the inverse
Hessian is the identity matrix multiplied by the scalar $\theta_{k+1}$, i.e.

$$d_{k+1} = -\theta_{k+1} g_{k+1}
+ \theta_{k+1} \frac{(g_{k+1}^T s_k) y_k + (g_{k+1}^T y_k) s_k}{y_k^T s_k}
- \left( 1 + \theta_{k+1} \frac{y_k^T y_k}{y_k^T s_k} \right) \frac{g_{k+1}^T s_k}{y_k^T s_k} s_k. \qquad (5.4)$$

For the step-length computation the algorithm implements the Wolfe line search
conditions in the same manner as in CONMIN.
CONMIN, established by Shanno and Phua [35] (see also [33, 34]), is a conjugate
gradient algorithm which may be interpreted as a memoryless BFGS
quasi-Newton algorithm optimally scaled in the sense of Oren and Spedicato [29].
In CONMIN the scaling is combined with Beale-Powell's restart criterion.

The direction $d_{k+1}$ in CONMIN is computed as:

$$d_{k+1} = -H_{k+1} g_{k+1} + A_k y_k - B_k s_k, \qquad (5.5)$$

where $H_{k+1}$ is the BFGS approximation of the inverse Hessian, which at every
iteration is initialized with the identity matrix, and $A_k$ and $B_k$ are specific matrices.
The main drawback of this method is that if $H_{k+1}$ contains useful information
about the inverse Hessian of the function $f$, then we are better off using the
search direction $d_{k+1} = -H_{k+1} g_{k+1}$, since the addition of the last terms in (5.5)
may prevent the direction $d_{k+1}$ from being a descent direction unless the line search is
sufficiently accurate.


AHYBRIDM, elaborated by Andrei [6], is an accelerated hybrid conjugate
gradient algorithm in which the search direction is computed as a convex
combination of the Hestenes-Stiefel [19] and Dai-Yuan [13] conjugate gradient
algorithms. The parameter in this convex combination is computed in such a way that
the direction corresponding to the conjugate gradient algorithm is the best
direction we know, i.e. the Newton direction, while the pair $(s_k, y_k)$ satisfies the
modified secant equation $B_{k+1} s_k = z_k$, where

$$z_k = y_k + \frac{\eta_k}{\|s_k\|^2} s_k, \qquad \eta_k = 2 (f_k - f_{k+1}) + (g_k + g_{k+1})^T s_k. \qquad (5.6)$$


CG-DESCENT was elaborated by Hager and Zhang [18] in order to ensure
sufficient descent, independent of the accuracy of the line search. In
CG-DESCENT the search direction $d_{k+1} = -g_{k+1} + \beta_k s_k$, where

$$\beta_k = \frac{y_k^T g_{k+1}}{y_k^T s_k} - 2 \frac{\|y_k\|^2 (s_k^T g_{k+1})}{(y_k^T s_k)^2}, \qquad (5.7)$$

satisfies the sufficient descent condition $g_k^T d_k \le -(7/8) \|g_k\|^2$. Mainly,
CG-DESCENT is a modification of the HS algorithm in such a way that when the iterates
jam, the expression $\|y_k\|^2 (g_{k+1}^T s_k) / (y_k^T s_k)^2$ in the formulation of
$\beta_k$ from (5.7) becomes negligible. This modification of the HS scheme makes
CG-DESCENT perform better than HS [18].


THREECG, written by Andrei [9], is a simple three-term conjugate gradient
algorithm which consists of a modification of the HS or of the CG-DESCENT algorithm in
such a way that the search direction is a descent direction and satisfies the conjugacy
condition. These properties are independent of the line search. The direction $d_{k+1}$ is
computed as $d_{k+1} = -g_{k+1} - \delta_k s_k - \eta_k y_k$, where

$$\delta_k = \left( 1 + \frac{\|y_k\|^2}{y_k^T s_k} \right) \frac{s_k^T g_{k+1}}{y_k^T s_k} - \frac{y_k^T g_{k+1}}{y_k^T s_k},
\qquad \eta_k = \frac{s_k^T g_{k+1}}{y_k^T s_k}. \qquad (5.8)$$

Also, the algorithm could be considered as a simple modification of the
memoryless BFGS quasi-Newton method [9].


TTCG, also written by Andrei [8], is a three-term conjugate gradient algorithm,
which is a modification of the Hestenes and Stiefel [19] algorithm or of the
CG-DESCENT algorithm by Hager and Zhang [18], for which both the descent
condition and the conjugacy condition are simultaneously satisfied.

The algorithm is given by (1.2), where the direction $d_{k+1}$ is computed as
$d_{k+1} = -g_{k+1} - \delta_k s_k - \eta_k y_k$, where

$$\delta_k = \left( 1 + 2 \frac{\|y_k\|^2}{y_k^T s_k} \right) \frac{s_k^T g_{k+1}}{y_k^T s_k} - \frac{y_k^T g_{k+1}}{y_k^T s_k},
\qquad \eta_k = \frac{s_k^T g_{k+1}}{y_k^T s_k}. \qquad (5.9)$$

Intensive numerical experiments showed that TTCG is clearly more efficient and
slightly more robust than THREECG [8].


Figure 1 shows the Dolan and Moré [15] CPU performance profile of TTSCAL 
versus these conjugate gradient algorithms. 


In a performance profile plot, the top curve corresponds to the method that solved 
the most problems in a time that was within a given factor of the best time. 


The percentage of the test problems for which a method is the fastest is given on 
the left axis of the plot. 


The right side of the plot gives the percentage of the test problems that were 
successfully solved by these algorithms, respectively. Mainly, the right side is a 
measure of the robustness of an algorithm. 
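

The construction of such a profile can be sketched as follows. This is a minimal illustration of the Dolan and Moré performance ratio, not the code used to produce the figures; the function name `performance_profile` and the timing data are hypothetical, and a failure is assumed to be recorded as an infinite time.

```python
import numpy as np

def performance_profile(times, taus):
    """Dolan-More performance profile.

    times: array of shape (n_problems, n_solvers); np.inf marks a failure.
    Returns rho with rho[i, s] = fraction of problems that solver s solved
    within a factor taus[i] of the best time on that problem.
    Assumes every problem is solved by at least one solver.
    """
    times = np.asarray(times, dtype=float)
    best = times.min(axis=1, keepdims=True)   # best time on each problem
    ratios = times / best                     # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Hypothetical CPU times (seconds) for 4 problems and 2 solvers
times = np.array([[1.0, 2.0],
                  [3.0, 1.5],
                  [2.0, 2.0],
                  [4.0, np.inf]])   # second solver fails on the last problem

rho = performance_profile(times, taus=[1.0, 2.0, 1e6])
# rho[0]  -> [0.75, 0.5]   left axis: fraction of problems on which each solver is fastest
# rho[-1] -> [1.0, 0.75]   right side: fraction of problems solved (robustness)
```

In this toy example the first solver is both more efficient (larger value at $\tau = 1$) and more robust (larger limiting value), so its curve lies on top over the whole range of $\tau$.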


When comparing TTSCAL with all these conjugate gradient algorithms subject to
the CPU time metric, we see that TTSCAL is the top performer.


Fig. 1. TTSCAL versus ASCALCG, CONMIN, AHYBRIDM, CG-DESCENT, THREECG
and TTCG, subject to CPU time metric.


The three-term accelerated conjugate gradient algorithm TTSCAL is more
successful and more robust than all these conjugate gradient algorithms. For
example, comparing TTSCAL versus CG-DESCENT (see Figure 1), subject to
the number of iterations, we see that TTSCAL was better in 645 problems (i.e. it
achieved the minimum number of iterations in 645 problems), CG-DESCENT was
better in 71 problems, and they achieved the same number of iterations in 54
problems, etc.


Out of 800 problems, only for 770 problems does the criterion (5.1) hold.
Therefore, in comparison with CG-DESCENT, TTSCAL appears to generate the
best search direction and the best step-length, on average.


It is known that, besides the conjugate gradient methods, two other methods can
be successfully tried for large-scale unconstrained optimization: the limited
memory BFGS method (L-BFGS) and the discrete truncated-Newton method (TN).
Both these methods use a low and predictable amount of storage, requiring only
the function and its gradient values at each iterate. Both methods have been
intensively tested on large problems of different types and their performance
appears to be satisfactory [25].


Therefore, in the second set of numerical experiments we compare TTSCAL
versus L-BFGS (m = 5) [27] and TN [24].

L-BFGS is an adaptation of the BFGS method for solving large-scale problems. In
the BFGS method the search direction is computed as $d_k = -H_k g_k$, where $H_k$
is an approximation to the inverse Hessian matrix of $f$, updated as

$$H_{k+1} = V_k^T H_k V_k + \frac{s_k s_k^T}{y_k^T s_k}, \qquad (5.10)$$

and $V_k = I - (y_k s_k^T)/(y_k^T s_k)$.


In the L-BFGS method, instead of forming the matrices $H_k$, a number of $m$
vectors $s_k$ and $y_k$ that define them implicitly are saved, as is described
in [27]. The numerical experience with the L-BFGS method, reported in [17],
indicates that values of $m$ in the range $3 \le m \le 7$ give the best results.
Therefore, in this paper we consider the value $m = 5$. The line search is
performed by means of the routine CVSRCH by Moré and Thuente [22], which uses
cubic interpolation.
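

How the saved vectors define the inverse Hessian approximation implicitly can be illustrated by the standard two-loop recursion. The sketch below is an assumption about the usual way this is done, not a transcription of the L-BFGS code used in the experiments. The function name `lbfgs_direction`, the scaling of the initial matrix $H_0 = \gamma I$, and the ordering of the stored pairs (oldest first) are illustrative choices.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Return d = -H g, where H is defined implicitly by the stored pairs
    (s_i, y_i), ordered from oldest to newest, via the BFGS update
    H <- V^T H V + s s^T / (y^T s),  V = I - y s^T / (y^T s)."""
    q = np.array(g, dtype=float)
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
    # Initial matrix H0 = gamma * I (assumed scaling, a common choice)
    if s_list:
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest pair
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r
```

With $m$ stored pairs the recursion costs about $4mn$ multiplications per iteration and never forms an $n \times n$ matrix, which is the source of the low and predictable storage requirement.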


The TN method is described by Nash [24]. At each outer iteration of the TN
method, for determination of the search direction $d_k$, an approximate solution
of the Newton system $\nabla^2 f(x_k) d_k = -g_k$ is found using a number of
inner iterations based on a preconditioned linear conjugate gradient method. The
matrix-vector products required by the inner conjugate gradient algorithm are
computed by finite differencing, i.e. the Hessian matrix is not explicitly
computed.
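

A Hessian-vector product obtained by finite differencing of the gradient can be sketched as follows. This is only an illustration of the idea; the forward-difference formula and the step-size heuristic are assumptions, and the TN code of Nash may use a different differencing parameter. The name `hessvec_fd` and the quadratic test function are hypothetical.

```python
import numpy as np

def hessvec_fd(grad, x, v, eps=1e-6):
    """Approximate (Hessian of f at x) @ v using one extra gradient evaluation:
    H v ~ (grad(x + h v) - grad(x)) / h."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(x)
    h = eps * (1.0 + np.linalg.norm(x)) / norm_v   # assumed step-size heuristic
    return (grad(x + h * v) - grad(x)) / h

# Quadratic test function f(x) = 0.5 x^T A x, with gradient A x and Hessian A
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
grad = lambda x: A @ x
x = np.array([1.0, -2.0])
v = np.array([0.5, 2.0])

hv = hessvec_fd(grad, x, v)   # close to A @ v = [4.0, 6.5]
```

Each inner conjugate gradient iteration therefore costs one additional gradient evaluation, and only vectors of length $n$ are stored.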


In addition, the conjugate gradient inner iteration is preconditioned by a
scaled two-step limited memory BFGS method, with Powell's restarting strategy
used to reset the preconditioner periodically. In TN the line search is
performed using the strong Wolfe conditions.


Figure 2 presents the Dolan and Moré CPU performance profiles of TTSCAL
versus L-BFGS (m = 5) and TN, respectively.


Fig. 2. TTSCAL versus LBFGS (m=5) and TN, subject to CPU time metric.


Observe that TTSCAL is more efficient and more robust than both the LBFGS
(m=5) and TN algorithms. These algorithms are different in many respects, and
the principles on which they are based are very different. The linear
algebra used in the LBFGS and TN codes to update the search direction is more
time consuming than the linear algebra in TTSCAL. This is the main reason why
TTSCAL is more efficient and more robust in this numerical study.


6. Conclusions


Three-term conjugate gradient algorithms represent one of the most important
developments in large-scale unconstrained optimization. In this paper the search
direction is selected as a linear combination of $-g_{k+1}$, $s_k$ and $y_k$,
where the coefficients in this combination are selected to minimize the
quadratic model of the objective function in which the symmetrical approximation
of the Hessian matrix satisfies the general quasi-Newton equation. The parameter
in the general quasi-Newton equation is determined by the formal equality
between the search direction used in the algorithm and the Newton direction. The
algebraic developments prove that the best value of this parameter is equal to
1. This mechanism for the search direction computation proved to be very
effective with respect to both the efficiency and the robustness of the
algorithm. Numerical experiments using a collection of 800 large-scale
unconstrained optimization test problems showed that the suggested three-term
conjugate gradient algorithm is both more efficient and more robust than some
known conjugate gradient algorithms, as well as the limited memory quasi-Newton
LBFGS and the discrete truncated-Newton methods.
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